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Abstract 


This document summarizes the history of HTML development, and defines 
the "text/html" MIME type by pointing to the relevant W3C 
recommendations; it is intended to obsolete the previous IETF 
documents defining HTML, including RFC 1866, RFC 1867, RFC 1980, RFC 
1942 and RFC 2070, and to remove HTML from IETF Standards Track. 


This document was prepared at the request of the W3C HTML working 
group. Please send comments to www-html@w3.org, a public mailing list 
with archive at <http://lists.w3.org/Archives/Public/www-html/>. 


1. Introduction and background 


HTML has been in use in the World Wide Web information infrastructure 
since 1990, and specified in various informal documents. The 
text/html media type was first officially defined by the IETF HTML 
working group in 1995 in [HIML20]. Extensions to HTML were proposed 
in [HTML30], [UPLOAD], [TABLES], [CLIMAPS], and [I18N]. 


The IETF HTML working group closed Sep 1996, and work on defining 
HTML moved to the World Wide Web Consortium (W3C). The proposed 
extensions were incorporated to some extent in [HTML32], and toa 
larger extent in [HTML40]. The definition of multipart/form-data from 
[UPLOAD] was described in [FORMDATA]. In addition, a reformulation of 
HTML 4.0 in XML 1.0[XHTML1] was developed. 
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[HTML32] notes "This specification defines HTML version 3.2. HTML 3.2 
aims to capture recommended practice as of early ’96 and as such to 
be used as a replacement for HTML 2.0 (RFC 1866)." Subsequent 
specifications for HTML describe the differences in each version. 


In addition to the development of standards, a wide variety of 
additional extensions, restrictions, and modifications to HTML were 
popularized by NCSA’s Mosaic system and subsequently by the 
competitive implementations of Netscape Navigator and Microsoft 
Internet Explorer; these extensions are documented in numerous books 
and online guides. 


2. Registration of MIME media type text/html 


MIME media type name: text 
MIME subtype name: html 
Required parameters: none 


Optional parameters: 


charset 
The optional parameter "charset" refers to the character 
encoding used to represent the HTML document as a sequence of 
bytes. Any registered IANA charset may be used, but UTF-8 is 
preferred. Although this parameter is optional, it is strongly 
recommended that it always be present. See Section 6 below for 
a discussion of charset default rules. 


Note that [HTML20] included an optional "level" parameter; in 
practice, this parameter was never used and has been removed from 
this specification. [HTML30] also suggested a "version" 
parameter; in practice, this parameter also was never used and has 
been removed from this specification. 


Encoding considerations: 
See Section 4 of this document. 


Security considerations: 
See Section 7 of this document. 


Interoperability considerations: 
HTML is designed to be interoperable across the widest possible 
range of platforms and devices of varying capabilities. However, 
there are contexts (platforms of limited display capability, for 
example) where not all of the capabilities of the full HTML 
definition are feasible. There is ongoing work to develop both a 
modularization of HTML and a set of profiling capabilities to 
identify and negotiate restricted (and extended) capabilities. 
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Due to the long and distributed development of HTML, current 
practice on the Internet includes a wide variety of HTML variants. 
Implementors of text/html interpreters must be prepared to be 
"bug-compatible" with popular browsers in order to work with many 
HTML documents available the Internet. 


Typically, different versions are distinguishable by the DOCTYPE 
declaration contained within them, although the DOCTYPE 
declaration itself is sometimes omitted or incorrect. 


Published specification: 
The text/html media type is now defined by W3C Recommendations; 
the latest published version is [HTML401]. In addition, [XHTML1] 
defines a profile of use of XHTML which is compatible with HTML 
4.01 and which may also be labeled as text/html. 


Applications which use this media type: 
The first and most common application of HTML is the World Wide 
Web; commonly, HTML documents contain URI references [URI] to 
other documents and media to be retrieved using the HTTP protocol 
[HTTP]. Many gateway applications provide HTML-based interfaces to 
other underlying complex services. Numerous other applications now 
also use HTML as a convenient platform-independent multimedia 
document representation. 


Additional information: 


Magic number: 
There is no single initial string that is always present for 
HTML files. However, Section 5 below gives some guidelines for 
recognizing HTML files. 


File extension: 
The file extensions ’html’ or /’htm’ are commonly used, but 
other extensions denoting file formats for preprocessing are 
also common. 


Macintosh File Type code: TEXT 
Person & email address to contact for further information: 


Dan Connolly <connolly@w3.org> 
Larry Masinter <lmm@acm.org> 
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Intended usage: COMMON 


Author/Change controller: 
The HIML specification is a work product of the World Wide Web 
Consortium’s HTML Working Group. The W3C has change control over 
the HTML specification. 


Further information: 
HTML has a means of including, by reference via URI, additional 
resources (image, video clip, applet) within the base document. In 
order to transfer a complete HTML object and the included 
resources in a single MIME object, the mechanisms of [MHTML] may 
be used. 


3. Fragment Identifiers 


The URI specification [URI] notes that the semantics of a fragment 
identifier (part of a URI after a "#") is a property of the data 
resulting from a retrieval action, and that the format and 
interpretation of fragment identifiers is dependent on the media type 
of the retrieval result. 


For documents labeled as text/html, the fragment identifier 
designates the correspondingly named element; any element may be 
named with the "id" attribute, and A, APPLET, FRAME, IFRAME, IMG and 
MAP elements may be named with a "name" attribute. This is described 
in detail in [HTML40] section 12. 


4. Encoding considerations 


Because of the availability within HTML itself for using character 
entity references, documents that use a wide repertoire of characters 
may still be represented using the US-ASCII charset and transported 
without encoding. However, transport of text/html using a charset 
other than US-ASCII may require base64 or quoted-printable encoding 
for 7-bit channels. 


As with all MIME text subtypes, the canonical form of "text/html" 
must always represent a line break as a sequence of a CR byte value 
(0x0D) followed by an LF (0x0A) byte value. Similarly, any 
occurrence of such a CRLF sequence in "text/html" must represent a 
line break. Use of CR byte values and LF byte values outside of line 
break sequences is also forbidden. This rule applies regardless of 
the character encoding (’charset’) involved. 
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Note, however, that the HTTP protocol allows the transport of data 
not in canonical form, and, in particular, with other end-of-line 

conventions; see [HTTP] section 3.7.1. This exception is commonly 

used for HTML. 


HTML sent via email is still subject to the MIME restrictions; this 
is discussed fully in [MHTML] Section 10. 


5. Recognizing HTML files 


Almost all HTML files have the string "<html" or "<HTML" near the 
beginning of the file. 


Documents conformant to HTML 2.0, HTML 3.2 and HTML 4.0 will start 
with a DOCTYPE declaration "<!DOCTYPE HTML" near the beginning, 


before the "<html". These dialects are case insensitive. Files may 
start with white space, comments (introduced by "<!--" ), or 
processing instructions (introduced by "<?") prior to the DOCTYPE 
declaration. 


XHTML documents (optionally) start with an XML declaration which 
begins with "<?xml" and are required to have a DOCTYPE declaration 
"<!DOCTYPE html". 


6. Charset default rules 


The use of an explicit charset parameter is strongly recommended. 
While [MIME] specifies "The default character set, which must be 


assumed in the absence of a charset parameter, is US-ASCII." [HTTP] 
Section 3.7.1, defines that "media subtypes of the ’text’ type are 
defined to have a default charset value of ’ISO-8859-1’". Section 


19.3 of [HTTP] gives additional guidelines. Using an explicit 
charset parameter will help avoid confusion. 


Using an explicit charset parameter also takes into account that the 
overwhelming majority of deployed browsers are set to use something 
else than ’ISO-8859-1’ as the default; the actual default is either a 
corporate character encoding or character encodings widely deployed 
in a certain national or regional community. For further 
considerations, please also see Section 5.2 of [HTML40]. 
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7. Security Considerations 


[HTML401], section B.10, notes various security issues with 
interpreting anchors and forms in HTML documents. 


In addition, the introduction of scripting languages and interactive 
capabilities in HTML 4.0 introduced a number of security risks 
associated with the automatic execution of programs written by the 
sender but interpreted by the recipient. User agents executing such 
scripts or programs must be extremely careful to insure that 
untrusted software is executed in a protected environment. 
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10. Full Copyright Statement 
Copyright (C) The Internet Society (2000). All Rights Reserved. 


This document and translations of it may be copied and furnished to 
others, and derivative works that comment on or otherwise explain it 
or assist in its implementation may be prepared, copied, published 
and distributed, in whole or in part, without restriction of any 
kind, provided that the above copyright notice and this paragraph are 
included on all such copies and derivative works. However, this 
document itself may not be modified in any way, such as by removing 
the copyright notice or references to the Internet Society or other 
Internet organizations, except as needed for the purpose of 
developing Internet standards in which case the procedures for 
copyrights defined in the Internet Standards process must be 
followed, or as required to translate it into languages other than 
English. 


The limited permissions granted above are perpetual and will not be 
revoked by the Internet Society or its successors or assigns. 


This document and the information contained herein is provided on an 
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
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